Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support events.preStop #1168

Merged
merged 4 commits into from
Oct 12, 2023
Merged

Support events.preStop #1168

merged 4 commits into from
Oct 12, 2023

Conversation

AObuchow
Copy link
Collaborator

What does this PR do?

Adds support for events.preStop in devworkspaces. Only exec type commands are supported.

Multiple preStop commands may be executed, following a similar format to postStart events, the difference being that the output of the commands are redirected to /proc/1/fd/1 (for stdout) and /proc/1/fd/2 for stderr. This allows reading the output of preStop commands in the kubernetes logs after pod termination.

The webhook warning against the usage of events.preStop was removed, as well as the mention of events.preStop in the unsupported devfile API doc.

What issues does this PR fix or reference?

Fix #1139

Is it tested? How?

Some automated tests (based on the postStart test cases) were added.

To manually test:

  1. Install DWO using the changes from this PR. Make sure the webhook is also built from this PR, otherwise you'll get the old warning about using events.preStop
  2. Apply a DevWorkspace that uses preStop events. I tested with the following:
kind: DevWorkspace
apiVersion: workspace.devfile.io/v1alpha2
metadata:
  name: pre-stop-test
spec:
  started: true
  routingClass: 'basic'
  template:
    projects:
    - name: web-nodejs-sample
      git:
        remotes:
          origin: "https://github.com/che-samples/web-nodejs-sample.git"
    components:
      - name: web-terminal
        container:
          image: quay.io/wto/web-terminal-tooling:next
          memoryLimit: 512Mi
          mountSources: true
          command:
           - "tail"
           - "-f"
           - "/dev/null"
    commands:
    - id: failing-command
      exec:
        component: web-terminal
        commandLine: ./not-a-script.sh
        workingDir: /tmp/
    - id: list-in-project
      exec:
        component: web-terminal
        commandLine: ls -al
        workingDir: ${PROJECT_SOURCE}
    - id: long-command
      exec:
        component: web-terminal
        commandLine: |
          date
          pwd
          sleep 5s
          echo "done sleeping"
        workingDir: $HOME
    - id: output-to-file
      exec:
        component: web-terminal
        commandLine: echo "writing to a file" > output-to-file.log
        workingDir: ${PROJECT_SOURCE}
    - id: multi-line-command
      exec:
        component: web-terminal
        commandLine: |
          echo "date: $(date)" > multi-line-command.log
          pwd >> multi-line-command.log
          sleep 5s
          echo "done sleeping: $(date)" >> multi-line-command.log
        workingDir: $HOME
    - id: background-longrunning
      exec:
        component: web-terminal
        commandLine: |
          nohup sh -c '{
            for i in {1..500}; do
              echo "background: $i"
              sleep 1
            done
          }' &
    events:
      preStop:
        - long-command
        - background-longrunning
        - output-to-file
        - failing-command
        - multi-line-command
        - list-in-project
  1. After the Devworkspace starts up, find the workspace pod's name. In my case, workspace623a8f20aa6442f3-6dccd76799-mqnt5 :
$ kubectl get pods -n $NAMESPACE
NAME                                           READY   STATUS    RESTARTS   AGE
devworkspace-webhook-server-7ff4b8d7bf-szb8f   2/2     Running   0          89s
workspace623a8f20aa6442f3-6dccd76799-mqnt5     1/1     Running   0          63s
  1. Open another terminal and monitor the logs of the workspace pod: kubectl logs -f workspace623a8f20aa6442f3-6dccd76799-mqnt5 -n $NAMESPACE

  2. In another terminal, delete the DevWorkspace: kubectl delete dw pre-stop-test -n $NAMESPACE

  3. In the terminal that you're monitoring the logs, you should see the output of all preStop event commands:

$ kubectl logs -f workspace623a8f20aa6442f3-6dccd76799-mqnt5 -n $NAMESPACE
Defaulted container "web-terminal" out of: web-terminal, project-clone (init)
Wed Aug 23 15:56:13 UTC 2023
/home/user
done sleeping
/bin/sh: line 17: ./not-a-script.sh: No such file or directory
background: 1
background: 2
background: 3
background: 4
background: 5
total 56
drwxr-xr-x. 1 1234 root   240 Aug 23 15:56 .
drwxrwxrwx. 1 root root    34 Aug 23 15:52 ..
drwxr-xr-x. 1 1234 root   158 Aug 23 15:52 .git
-rw-r--r--. 1 1234 root    25 Aug 23 15:52 .gitattributes
drwxr-xr-x. 1 1234 root    20 Aug 23 15:52 .github
-rw-r--r--. 1 1234 root    14 Aug 23 15:52 .gitignore
drwxr-xr-x. 1 1234 root    52 Aug 23 15:52 .vscode
-rw-r--r--. 1 1234 root 14197 Aug 23 15:52 LICENSE
-rw-r--r--. 1 1234 root   449 Aug 23 15:52 README.md
drwxr-xr-x. 1 1234 root    12 Aug 23 15:52 app
-rw-r--r--. 1 1234 root  1384 Aug 23 15:52 devfile.yaml
-rw-r--r--. 1 1234 root    18 Aug 23 15:56 output-to-file.log
-rw-r--r--. 1 1234 root 14289 Aug 23 15:52 package-lock.json
-rw-r--r--. 1 1234 root   265 Aug 23 15:52 package.json
background: 6
background: 7
background: 8
background: 9
background: 10
background: 11
background: 12
background: 13
background: 14
background: 15
$ # Pod has fully terminated and preStop events have finished

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path for verification integration with Che

@AObuchow
Copy link
Collaborator Author

I also verified that user-provided preStop events can't interfere with the async-storage-sidecar preStop event:

  • If a user creates a devworkspace using the async storage strategy and assigns a preStop event to a component named async-storage-sidecar without having a component named async-storage-sidecar, their devworkspace will fail during validation.
  • If a user uses the async storage strategy, creates a component named async-storage-sidecar and assigns a preStop event to it, their devworkspace will fail validation due to a duplicate container name (the internal async-storage-sidecar container and the user-defined async-storage-sidecar container component).

@codecov
Copy link

codecov bot commented Aug 23, 2023

Codecov Report

Patch coverage: 90.76% and project coverage change: +0.38% 🎉

Comparison is base (65bdb9e) 52.53% compared to head (a81c3d7) 52.91%.
Report is 8 commits behind head on main.

❗ Current head a81c3d7 differs from pull request most recent head 52719a0. Consider uploading reports for the commit 52719a0 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1168      +/-   ##
==========================================
+ Coverage   52.53%   52.91%   +0.38%     
==========================================
  Files          82       84       +2     
  Lines        7460     7544      +84     
==========================================
+ Hits         3919     3992      +73     
- Misses       3260     3269       +9     
- Partials      281      283       +2     
Files Changed Coverage Δ
pkg/library/container/container.go 46.66% <0.00%> (-3.34%) ⬇️
pkg/library/lifecycle/poststart.go 94.54% <ø> (-0.54%) ⬇️
pkg/library/lifecycle/prestop.go 94.54% <94.54%> (ø)
pkg/library/lifecycle/util.go 100.00% <100.00%> (ø)
webhook/workspace/handler/warning.go 100.00% <100.00%> (ø)

... and 4 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@amisevsk amisevsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, I'll test it soon

Comment on lines 24 to 27
const redirectPreStopOutputFmt = `{
%s
} 1>/proc/1/fd/1 2>/proc/1/fd/2
`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a doc comment on the intention behind this?

Also, I'm concerned whether this format will suppress any messages on failure -- if a prestop command fails, normally the output is stored in a FailedPreStopHook event. If output is redirected to stdout, it might actually make it harder to figure out what went wrong in case of failure (if the event no longer contains this information)

I'm thinking about this because normally with workspaces, pods are removed and not stopped, so pod logs will disappear (this output would only be visible if you were tailing the logs as the workspace stopped)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a doc comment on the intention behind this?

Will add some documentation on the intention of the redirection (though it seems this is more of a docker feature than kubernetes standard).

Also, I'm concerned whether this format will suppress any messages on failure -- if a prestop command fails, normally the output is stored in a FailedPreStopHook event. If output is redirected to stdout, it might actually make it harder to figure out what went wrong in case of failure (if the event no longer contains this information)

I'm having trouble figuring out how to actually get terminated pod events to see the FailedPreStopHook event. From my understanding, once a pod terminates, there's no way to actually get information or events from it (without some type of logging system installed on the cluster).

For example, the following pod will create a FailedPreStopHook event, but you can't really catch it:

apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh","-c","ls"]
      preStop:
        exec:
          command: ["/bin/sh","-c","badcommand"]

In my testing, I could only observe the FailedPreStopHook if I made the postStart command fail: the postStart event would fail, generating a FailedPostStartHook event, then the preStop hook would run, generating a FailedPreStopHook event, and then Kubernetes would restart the pod (eventually reaching a CrashLoopBackoff state).

I'm thinking about this because normally with workspaces, pods are removed and not stopped, so pod logs will disappear (this output would only be visible if you were tailing the logs as the workspace stopped)

This is true.

I think I need to experiment with a cluster logging/monitoring tool to see how the output redirection behaves, because it seems that under normal circumstances, it's impossible to see pod logs or events after pod termination

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a prestop command fails, normally the output is stored in a FailedPreStopHook event.

Oddly enough, I don't see failing postStart (or preStop) command output in the event, though the Kubernetes documentation mentions that it should appear.

I can confirm however that redirecting the command output will still result in a FailedPreStopHook warning event (though this wasn't your actual concern):

Events:
  Type     Reason               Age                From               Message
  ----     ------               ----               ----               -------
  Normal   Scheduled            13s                default-scheduler  Successfully assigned devworkspace-controller/lifecycle-demo to minikube
  Normal   Pulled               12s                kubelet            Successfully pulled image "nginx" in 422.082173ms (422.088026ms including waiting)
  Normal   Pulling              10s (x2 over 12s)  kubelet            Pulling image "nginx"
  Normal   Created              10s (x2 over 11s)  kubelet            Created container lifecycle-demo-container
  Normal   Started              10s (x2 over 11s)  kubelet            Started container lifecycle-demo-container
  Warning  FailedPostStartHook  10s (x2 over 11s)  kubelet            PostStartHook failed
  Normal   Killing              10s (x2 over 11s)  kubelet            FailedPostStartHook
  Warning  FailedPreStopHook    10s (x2 over 11s)  kubelet            PreStopHook failed
  Normal   Pulled               10s                kubelet            Successfully pulled image "nginx" in 464.96827ms (464.973595ms including waiting)
  Warning  BackOff              8s (x2 over 9s)    kubelet            Back-off restarting failed container lifecycle-demo-container in pod lifecycle-demo_devworkspace-controller(2067b4a6-9f64-4148-a2a5-d6483a315e4b

The above events came from the following pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh","-c","{ badcommand } 1>/proc/1/fd/1 2>/proc/1/fd/2"]
      preStop:
        exec:
          command: ["/bin/sh","-c","{ badcommand } 1>/proc/1/fd/1 2>/proc/1/fd/2"]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the concern is that redirecting output means any logs from a failed hook don't get added to the event.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I forgot to mention: even if the output isn't redirected, I don't see it in the event when doing kubectl describe.

The following pod spec, which has a failing, non-redirected postStart command:

apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh","-c","badcommand"]
      preStop:
        exec:
          command: ["/bin/sh","-c","{ badcommand } 1>/proc/1/fd/1 2>/proc/1/fd/2"]

Does not get its command output in the FailedPostStartHook event:

Events:
  Type     Reason               Age              From               Message
  ----     ------               ----             ----               -------
  Normal   Scheduled            5s               default-scheduler  Successfully assigned devworkspace-controller/lifecycle-demo to minikube
  Normal   Pulled               4s               kubelet            Successfully pulled image "nginx" in 589.651197ms (589.662446ms including waiting)
  Normal   Pulling              3s (x2 over 5s)  kubelet            Pulling image "nginx"
  Normal   Created              3s (x2 over 4s)  kubelet            Created container lifecycle-demo-container
  Normal   Started              3s (x2 over 4s)  kubelet            Started container lifecycle-demo-container
  Warning  FailedPostStartHook  3s (x2 over 4s)  kubelet            PostStartHook failed
  Normal   Killing              3s (x2 over 4s)  kubelet            FailedPostStartHook
  Warning  FailedPreStopHook    3s (x2 over 4s)  kubelet            PreStopHook failed
  Normal   Pulled               3s               kubelet            Successfully pulled image "nginx" in 460.949193ms (461.095788ms including waiting)
  Warning  BackOff              1s (x2 over 2s)  kubelet            Back-off restarting failed container lifecycle-demo-container in pod lifecycle-demo_devworkspace-controller(3ba486ed-64aa-48e8-a803-65c77c10b347)

This is on Minikube v1.30.1 using Kubernetes v1.26.3 on Docker 23.0.2.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output is contained in the message field of the event:

  • Without the redirect:
      message: |-
        Exec lifecycle hook ([/bin/sh -c {
        echo "hello"
        ls -al
        exit 1
    
        }
        ]) for Container "web-terminal" in Pod "workspace8f91680d98264907-54c9d766c4-pbzcw_dw(b52066eb-3bbf-4562-bc27-396267720503)" failed - error: command '/bin/sh -c {
        echo "hello"
        ls -al
        exit 1
    
        }
        ' exited with 1: , message: "hello\ntotal 16\ndrwxrwxr-x 1 root root 4096 May 23 18:40 .\ndrwxr-xr-x 1 root root 4096 May 23 18:40 ..\ndrwxrwxr-x 1 root root 4096 May 23 18:40 .config\n-rw-rw---- 1 root root  533 May 23 18:40 .viminfo\n"
    
  • With the redirect:
      message: |-
        Exec lifecycle hook ([/bin/sh -c {
        echo "hello"
        ls -al
        exit 1
    
        } 1>/proc/1/fd/1 2>/proc/1/fd/2
        ]) for Container "web-terminal" in Pod "workspace8f91680d98264907-c8c5f7c46-nczvc_dw(60ac330d-edcd-4c08-a566-3700173f847b)" failed - error: command '/bin/sh -c {
        echo "hello"
        ls -al
        exit 1
    
        } 1>/proc/1/fd/1 2>/proc/1/fd/2
        ' exited with 1: , message: ""

without the redirect, the message ends with

hello
total 16
drwxrwxr-x 1 root root 4096 May 23 18:40 .
drwxr-xr-x 1 root root 4096 May 23 18:40 ..
drwxrwxr-x 1 root root 4096 May 23 18:40 .config
-rw-rw---- 1 root root  533 May 23 18:40 .viminfo

You can view this via

kubectl get ev --field-selector reason==FailedPreStopHook -o yaml

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the example and clarification @amisevsk. To me, this is enough of a reason to not use the 1>/proc/1/fd/1 2>/proc/1/fd/2 redirect.

pkg/library/lifecycle/prestop.go Show resolved Hide resolved
Copy link
Collaborator

@amisevsk amisevsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openshift-ci
Copy link

openshift-ci bot commented Aug 30, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: amisevsk, AObuchow

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Fix devfile#1139

Signed-off-by: Andrew Obuchowicz <aobuchow@redhat.com>
Signed-off-by: Andrew Obuchowicz <aobuchow@redhat.com>
Signed-off-by: Andrew Obuchowicz <aobuchow@redhat.com>
Signed-off-by: Andrew Obuchowicz <aobuchow@redhat.com>
@openshift-ci
Copy link

openshift-ci bot commented Aug 30, 2023

New changes are detected. LGTM label has been removed.

@AObuchow
Copy link
Collaborator Author

/retest

1 similar comment
@AObuchow
Copy link
Collaborator Author

/retest

@musienko-maxim
Copy link
Collaborator

/test v8-images

@musienko-maxim
Copy link
Collaborator

/test v8-devworkspace-operator-e2e

1 similar comment
@musienko-maxim
Copy link
Collaborator

/test v8-devworkspace-operator-e2e

@AObuchow
Copy link
Collaborator Author

/retest

3 similar comments
@AObuchow
Copy link
Collaborator Author

/retest

@AObuchow
Copy link
Collaborator Author

AObuchow commented Oct 3, 2023

/retest

@AObuchow
Copy link
Collaborator Author

/retest

@AObuchow AObuchow merged commit 50b8100 into devfile:main Oct 12, 2023
8 checks passed
@AObuchow AObuchow deleted the preStop-events branch October 12, 2023 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for devfile API events.preStop
3 participants